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Abstract — Ever since the introduction of the internet, it has 
been void of any privacy. The majority of internet traffic currently 
is and always has been unencrypted. A number of anonymous 
communication overlay networks exist whose aim it is to provide 
privacy to its users. However, due to the nature of the internet, 
there is major difficulty in getting these networks to become 
both decentralized and anonymous. We list reasons for having 
anonymous networks, discern the problems in achieving decen- 
tralization and sum up the biggest initiatives in the field and their 
current status. To do so, we use one exemplary network, the 
Tor network. We explain how Tor works, what vulnerabilities this 
network currently has, and possible attacks that could be used 
to violate privacy and anonymity. The Tor network is used as a 
key comparison network in the main part of the report: a tabular 
overview of the major anonymous networking technologies in 
use today. 

1 Introduction 

All feelings of privacy concerning browsing the 
internet, talking on the telephone or location 
tracking of cellphones are an illusion. In recent 
years the need for privacy-enhancing technol- 
ogy has become more apparent. Revelations by 
Edward Snowden of government misconduct 
and constitutional violations have sent shock 
waves through the internet. 

We failed to make an internet that is secure 
and private. Although much research has been 
done on anonymous internet communication, 
only few systems have been actually imple- 
mented and only one is actively being used. 
One of the most important factors that impact 
anonymity in a communication system is the 
number of users. A sufficiently large number of 
users are required for a system to make guaran- 
tees about its ability to protect the privacy of its 
users. This makes it difficult to introduce a new 



anonymous internet system, due to the initial 
lack of users. To support a large number of 
users, such a network has to be decentralized. 
A lack of decentralization would otherwise re- 
sult in bottlenecks that place constraints on the 
number of users. Relying on server bandwidth 
donations has proven to be a difficult to sustain 
strategy. 

The most widely used anonymous commu- 
nication system is Tor. In this technical report 
we will analyse Tor and its semi-centralized 
nature. Tor struggles to keep up with the band- 
width demands of its users. As the number of 
users increases, the need to decentralize Tor be- 
comes more urgent. Decentralizing Tor isn't an 
easy task: After fifteen years of decentralization 
attempts, the network is still partially central- 
ized. Only few decentralized alternatives to Tor 
exist, however they lack the user base to be 
considered safe and useful. Examples include 
Gnutella II32L Freenet [19J and Tapestry. 

This technical report is structured as follow- 
ing: In section [2] we will give an introduction 
and overview of Tor. Known vulnerabilities in 
Tor are discussed in section [3] After that, we 
will talk about decentralization and its prob- 
lems in section |U The current state of decen- 
tralized internet systems is discussed in sec- 
tion [5] A comparison of existing decentralized 
networks is made in section [6j Finally, we will 
conclude and discuss our findings in section |7j 

2 INTRODUCTION TO TOR 

The implementation of The Onion Router 
(TOR) was first described in 1996 by the U.S. 



Navy Research Laboratory, as a means to pro- 
tect government communications from digital, 
as well as physical attacks, by hiding the loca- 
tion of the communicating party or parties fl5ll . 
The idea behind onion routing traces back to 
1981, where Chaum described it in his famous 
paper "Untraceable electronic mail, return ad- 
dresses, and digital pseudonyms" [9J. 

In 2002, the Tor project discontinued their 
old code base and re-implement the project 
as Tor, the Second Generation Onion Router. 
They introduced perfect forward secrecy, direc- 
tory servers, hidden services and more [10]. In 
this section, we will explain the various com- 
ponents of Tor, the structure of the network, 
circuit creation and disadvantages of Tor. 

2.1 Onion routing 

As described in the original design paper of 
The Onion Router Ifl5l , network traffic is for- 
warded through a circuit of nodes, where each 
node only knows the previous and next node 
in the circuit. With a sufficiently long circuit 
of (independent) nodes, this means that two 
communicating parties can remain oblivious of 
each others physical location. 

Say we have a circuit consisting of four 
nodes: our trusted client node (C), an (entry or 
guard) relay node (X), a (middle) relay node 
(Y) and an (exit) relay node (Z). In this case, 
there are three relay nodes, but this is not 
necessarily always the case as more middle 
relay nodes can be added. A visualization of 
this path can be seen in figure [l] Each node 
has its own public key and a corresponding 
private key. When building the circuit, our 
client generates a distinct secret for each of 
these nodes. More information about the circuit 



setup, can be found in 2.4 



The payload of each packet flowing through 
the circuit is first encrypted with the distinct se- 
cret for last node Z, then with the distinct secret 
for node Y, and last with the secret for node 
X. With each layer of encryption, a header is 
added with the address of the next node in the 
circuit, plus the used distinct secret encrypted 
with corresponding nodes public key. 

After node X receives this packet from our 
trusted node C, it decrypts the attached secret 



with its private key, and uses that secret to 
decrypt the rest of the packet. The result is a 
header with the address for node Y and the 
payload encrypted with secrets for the follow- 
ing nodes in the circuit, which is forwarded to 
node Y. 

As a node receives a packet from the previ- 
ous node, it peels off another layer of encryp- 
tion, much as how you can peel an onion layer 
for layer, and forwards it to the next node in the 
circuit (as specified in the decrypted header). 
When exit node Z decrypts the last layer, it 
forwards the payload outside the network to 
the original destination that our trusted client 
C tried to contact, acting as a traditional proxy. 

When exit node Z receives a response, this 
whole process is applied in reverse order, en- 
crypting the payload with its secret along the 
way, instead of decrypting. When our client C 
receives the packet, it peels off all the encryp- 
tion layers to retrieve the unencrypted payload. 

With the second generation onion routing 
used in Tor, a modified algorithm is used to 
derive the encryption keys, called telescoping 
path-building, which also provides perfect for- 
ward secrecy. This algorithm is described in 
section I2.4I 

2.2 Directory servers 

The original Onion Router used an unsafe, de- 
centralized node discovery mechanism called 
in-band status updates. During such a status 
update each node broadcasts known nodes to 
its neighbours. An attacker could exploit this 
to isolate and limit the knowledge of a client, 
forcing connections through malicious nodes. 
Another disadvantage is that in-band status 
updates take longer to propagate throughout 
the network and create a global consensus. 

To mitigate these concerns, directory servers 
were introduced to Tor during its reimple- 
mentation. These directory servers keep a re- 
dundant central consensus about the network. 
They act as HTTP servers to which Tor nodes 
can publish signed information about them- 
selves. Tor clients can in turn download this 
information, as seen in figure [TJ 

The information distributed by the directory 
servers is signed. The keys to verify these 
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Fig. 1: The components of the Tor network. After downloading the node list from the Directory 
Server, the user creates a circuit through a guard node, a relay node and an exit node. This 
circuit is used to communicate (anonymously) with the internet. 



signatures are preloaded in the Tor software, 
along with the list of directory servers. This 
implies trust by the Tor client in the directory 
servers. 



2.3 Relay and exit nodes 

The Tor network consists of several compo- 
nents. The clients in the Tor network are known 
as onion proxies. The software to run an onion 
proxy is available for free on the Tor website 
||37| and is easy for users to configure. The 
onion proxies are responsible for downloading 
the directory information, establishing circuits 
across the network and handling connections 
from user applications. 

The routing in the network is done by onion 
routers, also called relay nodes. The relay 
nodes relay the data from the onion proxy to 
the web server across a circuit (circuits are 



described in 2.4 1. Each onion router is con- 



nected to every other onion router with a TLS 
connection |[T7||. Each circuit has three type of 
onion routers 1120)1 : 



• The entrance Tor router: this router is di- 
rectly connected to an onion proxy and can 
observe the origin of a request through the 
Tor network. The entrance router sends the 
packet to the middle Tor router. 

• The middle Tor router: this router is con- 
nected to the entrance router and the exit 
router. 

• The exit Tor router: this router is connected 
to the web server. Note that the exit Tor 
router is the only router that can observe 
the final destination of the request. 

The first router in a circuit is the entrance 
router. The entrance router sends the data to 
one of the middle routers which forwards the 
data to the exit router. 

2.4 Circuit creation 

Data on the Tor network travels over several re- 
lay nodes before it reaches its destination. Such 
a selection of nodes is called a circuit. To ensure 
both good performance and anonymity, a path 
is chosen using a sophisticated path selection 



algorithm. This algorithm selects nodes based 
on the bandwidth of the nodes II35II . Nodes that 
have more bandwidth, have a higher probabil- 
ity to be chosen for the circuit creation. The 
same node cannot be used more than once in 
a single circuit. 

Suppose Alice is an onion proxy that wants 
to connect through the Tor network to a web 
server. Circuit creation uses the Diffie-Hellman 
key-exchange protocol lfl"6l to establish a shared 
secret between nodes. To create a new circuit, 
Alice first sends a create cell with the first half of 
the Diffie-Hellman handshake to the first node 
in her selected path (for example, ORi). ORi 
sends a created cell back with the second half of 
the key along with a hash of the final key. Now 
both Alice and OR\ have a shared key they use 
to encrypt and decrypt data sent between them. 

Alice now has a connection with the first 
onion router in the circuit. To extend the circuit 
to OR2, Alice first sends a relay extend cell to 
OR\. This cell contains the address of the next 
onion router in the circuit and the first half of 
the key to use in the communication between 
her and OR?,- OR\ takes this first half of the 
key and sends a create cell with this key to OR 2 . 
When ORi receives a created cell, ORi passes 
this cell to Alice. Now Alice and OR2 share a 
common key. The same procedure can be used 
to extend the circuit with more nodes. 

2.5 Disadvantages 

While Tor offers its users a high level of 
anonymity, there are some disadvantages using 
it. According to Dingledine et al IfTTf , there 
are six reasons why Tor is not optimal. In this 
section, we will summarize these reasons and 
explain what could be done to fix them. 

• Tor's congestion control does not work 
well. The network has some problems han- 
dling bulk transfers, such as download- 
ing large files or streaming high-quality 
videos. The congestion control could be 
improved by using an unreliable protocol 
such as UDP for links between Tor relays. 
Goldberg et al. [2] have proposed PCTCP 
which could improve the response time of 
the Tor network by 60% and the download 
time of files by 30%. 



• Some Tor users put more traffic on the net- 
work than they contribute by running an 
onion router. This means that these users 
are slowing the network down as they use 
more traffic than giving back. A possible 
solution for this is to throttle certain high- 
bandwidth protocols such as BitTorrent at 
exit nodes or at onion proxies. 

• The Tor network doesn't have the capacity 
to handle all the users that want privacy on 
the internet. According to the Tor Metrics 
project 1136 L it takes about 6 seconds to 
download 1 MiB of data. 

Due to the fact that traffic travels over 
multiple Tor nodes, the total amount of 
transferred data in the network multiplies. 
This is illustrated in figure |2j Normally, 
when 1 GiB is transferred over the internet, 
it has a network cost of 2 GiB^| By having 
n hops, the amount of transferred traffic 
would be multiplied by 2{n + 1). As Tor 
uses 3 hops by default, this means that a 
1 GiB transfer would result in a network 
cost of 8 GiB. By increasing the amount of 
onion routing nodes in the Tor network, 
the capacity is increased. Incentives such 
as LIRA [18J could make more users run an 
onion router, thus increasing the capacity 
of the network and making it faster. 

• The current path selection algorithm of Tor 
doesn't distribute the load evenly over the 
network. The problem is that the current 
selection strategy is optimal when the net- 
work is fully loaded. This is not always 
the case. Using a better path selection al- 
gorithm could increase the capacity of the 
network and the overall user experience. 

• Tor clients are not optimal at handling 
latency and connection failures. For ex- 
ample, if extending a circuit fails, the en- 
tire circuit is abandoned. An improvement 
would be to first try to extend the circuit to 
some other places. If that fails, the circuit 
could be abandoned. Also, a better timeout 
mechanism could be chosen for building 
circuits. 

• Much of the overhead of the network is 

1. 1 GiB upload by the sender and 1 GiB download by the 
receiver, or 2 GiB in total. 
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Fig. 2: As traffic moves over Tor nodes, the total amount of bandwidth used in the network 
increases. By using n hops, the total amount of network traffic would be multiplied by 2(n + 1). 



in downloading the directory information. 
There is also overhead in the TLS connec- 
tion between the nodes in the network. 
According to Dingledine et al, removing 
the empty TLS application record could 
reduce the overhead in the TCP/IP header 
by 6.3%. 

The directory service generates overhead on 
the network. Replacing this central authority 
with a decentralized component, could reduce 
the overhead of the network and improve per- 
formance. In conclusion, this means that we 
would like to see Tor decentralized. Although 
much research has been done on the decentral- 
ization of Tor, it still uses centralized compo- 
nents today. 

2.6 Tor stinks? 

Tor is not only used by human rights activists. 
It is also used by distributors of illegal content 
and providers of illegal services, because it is 
deemed untraceable. This serious problem is 
disrupting Tor's public image. A recent ex- 
ample is the shutdown of Silk Road, which is 
an online market for trading prohibited sub- 
stances and other illegal goods [|26l|. The market 
operated as a hidden Tor service. These services 
are accessed through an onion address and not 
an IP-address, hiding the physical location of 
the service. This makes it very hard for agen- 
cies to track and shut down these operations. 

A recent research on the content and popu- 
larity of Tor's hidden services [7] has shown 
that although there are Tor hidden services 
that distribute illegal content, many hidden 
services are resources devoted to human rights, 
freedom of speech and information which is 
prohibited in some countries. It is not clear 
which type of service is more popular on Tor. 



Unless hidden services are used, content 
travels unencrypted through a Tor exit node. 
Users running a Tor exit node could be held 
responsible for distributing illegal content. The 
possibility of being seen as the originator of 
illegal content refrains users from running an 
exit node. A possible solution could be to filter 
the illegal content from the legal content. While 
content filtering could be a possible solution, it 
is an open question whether filtering is in line 
with the principles of Tor and the internet. 

3 TOR VULNERABILITIES AND ATTACKS 

Besides the disadvantages mentioned in the 
previous section, Tor also suffers from several 
vulnerabilities that can be exploited through 
attacks Q, (131, & In this section we will 
summarize some of the most well known prob- 
lems with Tor as well as define the following 
categories of attacks: browser based attacks, 
low-resource routing attacks, Sybil attacks and 
replay attacks. 

3.1 Browser based attacks 

Traffic analysis can be used to attack the 
anonymity of a user (a Tor client) browsing 
the web using Tor []]]. By misusing the exit 
policy of Tor one can reduce the time required 
to perform the analysis from 0(nk) to 0(n + k) 
where n is the number of exit nodes and k is 
the number of entry guards. 

By running an HTTP exit node and a Tor 
router that eventually will act as an entry node 
in the network, an adversary can discover the 
identity of a user. The exit node injects an invis- 
ible iframe containing some JavaScript into any 
web page that passes through it, each sending 
a unique ID to a malicious web server. Every 



ten minutes the Tor client chooses a new circuit 
and eventually an unlucky Tor client picks and 
uses the malicious entry node that was placed 
in the network. 

By performing traffic analysis to compare 
the unique IDs of the web server and the 
circuits passing trough the entry node, a user 
can be identified. Disabling JavaScript does not 
mitigate this, because a similar attack can be 
set up only using the HTML meta refresh tag. 
To increase the odds of a user choosing the 
malicious exit node, one can run the exit node 
on unpopular ports. There are usually only a 
few exit nodes running on file sharing ports, 
such as 4661 to 4666. Since Tor prefers older 
circuits, using a denial of service attack against 
the older exit nodes forces Tor into creating a 
circuit with the malicious exit node. 

The solution for the JavaScript injection at- 
tack is disabling active content systems in the 
browser. For the HTML only variant one would 
have to use HTTPS to prevent man-in-the- 
middle attacks. 

3.2 Low-resource routing attacks 

Another possible attack on the anonymity of 
Tor is the so called low-resource routing at- 
tack (5). With this attack, it is possible for 
an adversary to perform an end-to-end traffic 
analysis with minimal resources, thus compro- 
mising the anonymity Tor provides. The idea 
of this attack is that a malicious onion router 
can lie about its bandwidth, thus advertising 
a much higher bandwidth than it actually has. 
Because of the Tor path selection algorithm that 
prefers high-bandwidth nodes, the chance that 
a malicious entry and exit node are chosen is 
high. 

Once a malicious entry and exit node have 
been chosen, an analysis of the traffic can be 
done to link onion proxies with the web servers 
they communicate with. Experimental research 
in a test setting has shown that with a total of 
66 non-malicious and 6 malicious nodes, is pos- 
sible to compromise 46% of the built circuits. At 
the request of the Tor community, this attack 
hasn't been tested on the live Tor network. 
There are some extensions and improvements 
to this attack. It is possible to perform a denial 



of service attack on well used entry nodes, 
forcing Tor clients to choose a new one. This 
improves the chance that a Tor client chooses 
a malicious entry node. 

Bauer et al. proposed several solutions in 
their paper. The first one is to actually verify 
the resources of the nodes by, for example, 
measuring the bandwidth and/ or uptime of a 
node. Bandwidth can be checked centralized or 
decentralized: the disadvantage of a centralized 
bandwidth check is that it generates much 
overhead on the network. With distributed 
bandwidth verification, Tor routers monitor 
each other but this is not enough to detect 
selective malicious nodes. Another solution is 
to restrict the amount of routers that can be on 
a single IP-address. The last solution proposed 
is to change the routing strategy. 

3.3 Sybil attacks 

The Sybil attack is an attack where a single 
attacker represents itself as millions of nodes 
in a peer-to-peer system. Abusing this, the 
attacker is able to propagate false assumptions 
about the network to other nodes. 

First described by Douceur [13j, he mathe- 
matically proves that this attack is always pos- 
sible without a central authority that certifies 
the participating nodes in one way or another. 
The exception to this rule is what he calls "ex- 
treme and unrealistic assumptions of resource 
parity and coordination among entities", or 
in other words: require all participants to do 
something expensive (in terms of resources) to 
identify themselves. This must be done within 
a small enough time frame, so that an attacker 
can't do this in sequence, but all nodes must 
do them in parallel. 

A fully distributed network that implements 
such a solution is the Bitcoin network [23J, in 
which computing power, and not the number 
of nodes is important for the general network 
consensus. 

3.4 Replay attacks 

A replay attack ||28H happens when a malicious 
entry node duplicates cells and sends them 
again. Since Tor uses the counter mode of 
Advanced Encryption Standard (AES-CTR) for 



encryption and decryption, the counter will be 
wrong when the duplicated package arrives 
causing the circuit to be destroyed. 

Using this, an accomplice exit router can, in 
cooperation with the entry node, discover the 
sender and receiver's relationship. This attack 
can also be used as a denial of service attack. 
According to Pries et al., defending against this 
attack is quite challenging and requires further 
research. 



4 Problems with decentralizing 

Decentralization is a difficult research prob- 
lem. A trusted central authority simplifies boot- 
strapping, key management and user reputa- 
tion. If one of these authorities were taken 
control of, anonymity could be compromised. 
When decentralizing a central authority, its 
functionality needs to be dispersed across the 
peers in the network. In this section we will 
explain the various problems involving Tor 
decentralization. 



4.1 Incentives in decentralized systems 

Tax evasion and environmental pollution can 
be seen as forms of free-riding, a phenomenon 
that is prominent in Tor. People predominantly 
use more Tor bandwidth than they donate, 
see section 



2.5 There are multiple proposals 



[12], ||T8| to introduce incentives into Tor, all 
have failed. If one would also have to build an 
incentive system into a decentralized system, 
they would have to find a way to manage the 
ratings of each client in the network, in such 
a way that they cannot be falsely modified. 
In other words, the reputation data has to be 
accurate and reliable. Besides the integrity of 
this data, the traffic it generates on the network 
should have minimal impact on the overall 
performance. 

Rahman p0|| proposes several options to 
build incentives in a peer-to-peer network. The 
first proposal that is described is the so called 
Warm-glow Model. This model determines the 
percentage of free-rides based on the proba- 
bilistic population distribution. If the percent- 
age is above a certain threshold, the system will 
show signs of diminishing marginal returns. 



The second proposal is using monetary 
schemes. While it is not exemplified a lot, the 
main idea is to use a virtual currency as incen- 
tive. The problems with this approach are the 
scalability and the hidden costs of this service. 

The third proposal is Reciprocity-Based 
Schemes. Using this approach, a peer main- 
tains a behaviour history of other peers in 
the network. These schemes can be based on 
two somewhat reciprocities: direct reciprocity 
or indirect reciprocity. The former are more 
suitable for longer relationships between peers. 
The latter is more scalable but they rely on third 
party and must handle trust issues themselves. 

4.2 NAT traversal 

A truly decentralized system requires the par- 
ticipating nodes to have direct connection to 
each other. Because of the limited availability of 
IPv4 network addresses, most consumer grade 
internet connections only provide one network 
address per subscriber, snared by all the de- 
vices connected to the subscribers network us- 
ing Network Address Translation (NAT). With 
IPv4 network addresses getting more scarce, 
some Internet Service Providers even put more 
than one subscribers behind a single network 
address, using a carrier-grade NAT. 

A NAT-based system works by creating a 
local, private network which a NAT-enabled 
router connects to the internet. The local net- 
work uses network addresses from the private 
ranges (e.g. 10.0.0.0/8 or 192.168.0.0/16). When 
a local device sends a packet to the internet, 
the router replaces these private addresses, in- 
cluding the source ports, with its own public 
address before forwarding it to the internet. It 
saves these translations in a local table. Once 
it receives packets, it looks in this table and 
replaces the public network address and port 
with the associated private network address 
and port. If no corresponding entry exists in 
the routers table, the packet is dropped. This 
means that when contacting a device behind a 
NAT, there must be an existing entry in this 
table. 

There are several techniques to add an entry 
to the routers NAT table. Universal Plug and 
Play (UPnP) is one of those techniques, where 



the local device uses an HTTP request to the 
router to associate a port with the devices 
private network address. This technique is not 
available everywhere and sometimes consid- 
ered a security risk. Different implementations 
of NAT require different techniques, such as 
hole punching, relaying or reversal, as de- 
scribed by Wocker et al. Il34l . 

4.3 Bootstrapping new nodes 

If a Tor user decides to donate some of his 
bandwidth by running a bridge or a relay 
and thus creating a new node in the network, 
there has to be a starting point where this new 
node can discover neighbours in the network 
to connect with. In Tor, a directory server can 
tell the new node what his neighbours are and 
where to find them fT0|. 

Moving this system to a peer-to-peer base is 
difficult: Dingledine et al. stated that this is in- 
deed still an open problem. With decentralized 
systems, there is no central directory server to 
tell a new node where to locate neighbours. 
Some systems such as Tarzan, MorphMix and 
Pastry [33], ||3~T1 are decentralized but they do 
suffer from performance issues. 

4.4 Key exchange 

With a decentralized network, using a central- 
ized authority for managing the keys is not 
possible. This means that for secure commu- 
nication, peers have to exchange the keys di- 
rectly with each other, without a trusted party 
between them. Diffie-Hellman is a very pop- 
ular algorithm for exchanging keys between 
two parties. It is used in the circuit creation 
in Onion Routing (see section 2.4[ > for exam- 
ple. However, it is possible for an adversary 
to manipulate the keys exchanged between 
two parties, making the protocol vulnerable 
to a man-in-the-middle attack. This weakness 
makes it possible for an adversary to decrypt 
all messages sent between the two parties. 

Tor currently uses an interactive forward- 
secret key-exchange protocol called the Tor Au- 
thentication Protocol (TAP) [5j. This protocol 
uses telescoping, which means that the initiator 
negotiates session keys with each successive 
hop in the circuit. There are several proposals 



for more efficient key exchange methods. One 
of them is ACE, an one-way authenticated key 
exchange protocol. The authors of this methods 
claim to have a 46% efficiency improvement on 
the side of the client and nearly 19% on the 
side of the onion routers. ACE requires clients 
to send one extra element in the key exchange. 
This does not introduce any overhead however, 
because the element fits in the unused space in 
a cell. 

5 Decentralized 
privacy-enhancing systems 

Fully decentralized systems with large scale 
usage are without exception based on the peer- 
to-peer paradigm. Many such systems have 
been proposed, yet only some have been imple- 
mented and are currently in use and actively 
maintained E2L El, 0, E3, E3L EL lEL 
[4J. Here we focus on fully decentralized net- 
works with the exception of Torsk (which is 
almost fully decentralized but still requires a 
neighbourhood authority). 

5.1 Gnutella 

Gnutella is a decentralized peer-to-peer net- 
work used for distributed search of files. Since 
the network is fully decentralized, peers in the 
network are called servents, a combination of 
the words servers and clients. Each peer can 
act both as a server, answering queries, or as a 
client, requesting and executing search queries. 
In order for a client to bootstrap, a new peer 
connects to one of several known hosts that are 
almost always available [^J Once the peer has 
joined, there are several messages a servent can 
send out: 

• A peer sends a PING message to its neigh- 
bours to announce its presence. This mes- 
sage is forwarded to other peers and each 
peer sends a PONG message back. 

• The peer can issue QUERY answers. Other 
peers responds with a QUERY RESPONSE 
message to specify whether the file that 
was issued in the query, was found or not. 

• To transfer items between peers, the GET 
and PUSH messages are being used. 

2. These peers can be found on http:/ /gnutellahosts.com. 



Gnutella is an unstructured network which 
means that the placement of data items is 
not based on any knowledge of the network 
topology nor the contents of the file. To search 
for a file, a flooding algorithm is used. 

5.2 Freenet 

In Freenet, each data item is represented by 
a key that is independent of the location of 
the file. Freenet is called a loosely structured 
network because of this. To issue a query, the 
request is passed from client to client where 
each client makes a decision about the location 
to send the request next. 

There are three types of file keys in Freenet: 
the first one is called the Keyword-Signed Key 
(KSK) which is derived from a short descrip- 
tion of the file. Another key is the Signed- 
Subspace key (SSK) which enables personal 
namespaces. This key contains a public and a 
private key. The private key is used to store the 
data and the public key is used in the queries 
for the file. The third type of key is the Content- 
Hash Key (CHK) which is used for updating 
and splitting of contents. 

The routing algorithm for storing and re- 
trieving data is dynamic and can adjust to the 
topology of the network. Each peer only has 
knowledge about his neighbours. Each request 
has a Hops-To-Live timer which indicates how 
many peers the request may traverse. Each 
peer decrements the timer by one and when 
the timer reaches zero, the request isn't for- 
warded any more. Results of queries are being 
cached in intermediate nodes to reduce the 
time for a query response. To prevent looping 
of the requests, each request contains a random 
identifier. The peers that the request travels 
through, keep track of these identifiers and 
rejects the request if the request has already 
been answered by the peer. 

5.3 Tapestry 

Tapestry is based on the Plaxton mesh data 
structure, which maintains pointers to nodes in 
the network whose IDs match the elements of 
a tree-like structure of ID prefixes up to a digit 
position. A property of Tapestry is that it offers 
load distribution and routing locality. 



Like Gnutella, peers can take the role of a 
client, issuing requests, and the role of a server 
where objects are stored. A peer can also func- 
tion as a router, which forwards an incoming 
message. The routing algorithm is based on 
the destination ID of the packet. Routers are 
using local routing maps to route messages to 
the destination ID digit by digit. The routing 
system ensures that each peer in the system 
can be found in a logarithmic amount of hops. 

Tapestry is a fundamental component of 
OceanStore, a decentralized storage system. 
Tapestry is also used in systems such as Bayeux 
and SpamWatch, a decentralized spam-filtering 
system. 

5.4 Pastry 

Pastry is very similar to Tapestry, but there are 
some small differences. One of these differences 
is the handling of network locality and data 
object replication. Pastry also uses the Plaxton 
mesh data structure for the routing algorithm. 
Each peer in the network gets assigned a ran- 
dom 128-bit identifier that is uniformly sam- 
pled from the key space. Each node can be 
found in about log(ra) steps. 

The Pastry overlay network is used in several 
applications, such as Scribe, Squirrel and PAST. 
Scribe is a system that has been built to send 
multicast messages. Instead of relying on the 
multicast infrastructure, multicast messages are 
sent using only unicast services. Pastry is used 
to create and manage multicast groups. Scribe 
makes use of the organization, robustness and 
reliability of the Pastry network. 

Squirrel is a decentralized peer-to-peer web 
cache. The network uses Pastry to locate its 
objects and for the routing algorithm. Squirrel 
allows users to share its web cache with other 
users in the network, creating a large decentral- 
ized web cache. Squirrel however introduces 
some overhead when searching the cache. The 
challenge is to keep this overhead as low as 
possible. 

PAST is a large scale persistent peer-to-peer 
network that has been designed to store files. 
It is built upon the Pastry network and the 
main focus of PAST is providing performance, 
scalability and security. 



5.5 MorphMix 

MorphMix H3TI1 functions similar to both Tor 
and MIX networks. It relies on nested encryp- 
tion and routing traffic over multiple nodes to 
ensure anonymity of its users' communication. 
Additionally, MorphMix uses the typical be- 
haviour of a MIX network, where it reorders 
messages that enter a node before sending 
them out. 

MIX networks are typically high latency and 
traffic in it moves slow. Messages will have to 
be stored on a node until enough messages 
have arrived to start sending them out in a 
random order. Often, cover traffic is used to 
generate enough messages for the network to 
obfuscate the real communication, which in 
turn generates a lot of bandwidth overhead. 
MorphMix has been developed as a low latency 
and high performance network. As such, it will 
not hold messages for very long, nor will it use 
cover traffic. 

In contrast to Tor, MorphMix does not fea- 
ture a centralized node discovery mechanism. 
Instead, every node is free to chose a set of 
next nodes that will be the continuation of an 
anonymous tunnel. A malicious node could se- 
lect a colluding node to continue the tunnel and 
therefore control the entire tunnel. To prevent 
this, a witness node is appointed to mediate the 
setup of an anonymous tunnel. Although this 
makes the possibility of such an attack more 
difficult, it doesn't make it impossible. 

An important part of MorphMix is the ability 
to detect malicious tunnels. Tunnels that are set 
up by well behaving nodes will select the next 
nodes in the tunnel randomly. Malicious col- 
luding nodes, however, will specifically select 
nodes that are part of the malicious network 
to continue the tunnel. This will reveal itself 
in the fact that the probability of the selection 
of certain nodes is increased. By using this 
information it becomes possible, to an extent, 
to detect malicious tunnels. 

5.6 AP3 

AP3 (Anonymizing Peer-to-Peer Proxy) |22| 
makes cooperative, decentralized anonymous 
communication possible. The AP3 system pro- 
vides clients with three primitives: anonymous 



message delivery, anonymous channels and 
secure pseudonyms. Users are able to send 
and receive unicast, multicast and anycast mes- 
sages anonymously. The strategy that AP3 is 
using for message delivery is similar to that 
of Tarzan: it relies on a network of peers to 
forward messages. A node along the request 
path, does not know whether the node from 
which it receives a message is the message's 
originator or simply another forwarding peer. 

5.7 Tarzan 

Tarzan |fT4| is a fully distributed peer-to-peer 
anonymity network. It implements a network 
address translator (NAT) to bridge between 
nodes running Tarzan and the internet. This 
means that services don't have to be aware of 
the fact they are running through Tarzan. 

Tarzan requires knowledge of a few exist- 
ing nodes to bootstrap and uses a gossiping 
protocol to discover other nodes. Nambiar et 
al. showed however that this does not scale 
beyond roughly 10,000 nodes ||24l 

Once Tarzan has knowledge of enough 
nodes, it achieves its anonymity with much like 
a Chaumian mix, with layered encryption and 
routing through multiple hops. In contrast to 
Tor and other networks, Tarzan uses cover traf- 
fic to provide protection against traffic analysis 
by a global advisory to find an initiator. 

5.8 Tribler 

Tribler is a social-based peer-to-peer file shar- 
ing system backwards compatible with the Bit- 
Torrent protocol [27J. Tribler considers social 
phenomena and the sense of community as 
important parts of file sharing. Although the 
system is completely decentralized, it does not 
yet provide its users with anonymity. However, 
a modified Tor-like protocol for decentralized 
use is currently in beta. 

Tribler introduces a novel protocol called 
BuddyCast. Peer and content discovery use 
this protocol, which disseminates information 
epidemically. Additionally, the protocol allows 
users to find taste buddies, which are peers 
that share similar interests in files. This enables 
quick finding of content that a user is interested 
in and builds on the idea of social phenomena. 



Although not an anonymization network, 
Tribler does accomplish a lot on the topic of 
decentralization. The BuddyCast protocol uses 
BitTorrent infohashes to spread information 
completely decentralized throughout the net- 
work. With future work on anonymization, this 
can be a promising approach to anonymous file 
sharing. 

5.9 NISAN 

NISAN, or Network Information Service for 
Anonymization Networks [3], is an anonymiza- 
tion network which implements a distributed 
node discovery. Not only does a central node 
administrator (the directory server in Tor) im- 
ply trust in those servers, Panchenko et al. also 
argue that a central node administration (the 
directory server in Tor) does not scale. The cur- 
rent directory server protocol was already im- 
proved two times to reduce bandwidth costs, 
with a fairly low amount of users. 

NISAN implements a DHT-based approach 
(Kademlia) for distributing node information, 
in such a way that does not require the client to 
know about all the nodes in the network (such 
as in Tor). To build a circuit, NISAN generates 
random IDs and searches for the closest hit 
throughout the network. This makes it possible 
to build a path with nodes picked in a random, 
uniform way among all nodes, without the 
trust of a third party. 

This does not fully protect against finger- 
printing or bridging attacks (passive attacks), 
and suggest to do random walks throughout 
the network to mitigate that. The authors admit 
however this decreases the protection against 
an active attack. 

5.10 Torsk 

Torsk [|2T| is an extension to Tor, designed to 
be an interoperable replacement for the circuit 
creation and directory servers as used by Tor. 
The authors argue that the current directory 
servers do not scale, with the percentage of 
the traffic in a network dedicated to node dis- 
covery growing as the number of nodes grow. 
With the 2009 version of Tor, they argue that 
100% of the networks traffic would consist of 



node discovery traffic with roughly 1.2 million 
clients. 

Instead of the directory servers, it uses a 
DHT and a new neighbourhood authority. The 
DHT is a combination of Kademlia DHT and 
Myrmic DHT. Kademlia DHT was chosen be- 
cause it is already widely used and it has 
proven itself for a large number of users. 

The Myrmic DHT runs on top of Kademlia, 
and introduces the neighbourhood authority. 
This authority issues certificates to nodes that 
participate in the DHT, but it does not par- 
ticipate in the DHT itself. The neighbourhood 
authority makes this solution not a fully decen- 
tralized one, but its role is a lot smaller than the 
current directory servers. This does not solve 
the trust issue, but it does solve the scalability 
issue. 

5.11 Comparison 

Using the properties of each of the previously 
described networks, we can now draw a com- 
parison between these networks. This is done 
in the form of a tabular overview, see table [l] 
The networks are compared according to the 
following features: 

• Compatibility with Tor: Is the network 
compatible with Tor? Could the network 
or some features of the network be used 
for the decentralization of Tor? 

• Public implementation: Does a publicly 
available implementation exist? 

• Used in practice: Is the network used in 
practice? 

• Attack resistance: What weaknesses does 
the network have and which attacks are 
possible? 

• Unlinkability: Does the network hide the 
identities of the sender and /or receiver? 

6 The documented struggle of 
alternative internet projects 

The largest repository of decentralization at- 
tempts is located at redecentralize.org. The aim 
of this repository is to 'get decentralized prod- 
ucts into the hands of billions' 112911 . One of the 
ways they do this, is by maintaining a Github 
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Tapestry 
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MorphMix 
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AP3 


2004 
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X 
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? 


7 


7 


/ 


Tarzan 


2002 


X 


X 


X 


? 


/ 


7 


/ 


Tribler 


2008 


X 


/ 


/ 


/ 


X 


X 


X 


MS AN 


2009 


X 


X 


X 


7 


7 


/ 


/ 


Torsk 


2009 


/ 


X 


X 


/ 


/ 


/ 


/ 



TABLE 1: A comparison of decentralized peer- 
to-peer overlay networks. 

repository^ with projects that in some way help 
to decentralize the internet. The struggle and 
pains of these projects illustrates the difficulty 
of decentralization. No projects succeeded in 
creating an alternative internet infrastructure. 
The projects range from self hosted cloud ap- 
plications, to crypto currencies, to anonymous 
networks. 

We made a significant contribution to this 
project index. We created a table containing 
each of the listed systems that shows statistics 
such as the total lines of code (LOC), age, 
number of contributors and commits, activity 
and (main) programming language, sourced 
from Ohloh. This should make it easier to filter 
out poorly maintained or otherwise deprecated 
projects. The table is available on the same 
Github repository. 

An excerpt of the table is included as table |2j 
In this instance, we chose to sort the table on 
the number of commits, because we found that 
this most accurately represents both the matu- 
rity and activity of the projects. Other statistics 
such as LOC might not be very relevant on 
their own, because some projects include big 
libraries or other projects in their repository. 

In the table we notice that three of the 
entries are privacy-enhancing networks that 

3. Found at https://github.com/redecentralize/alternative- 
internet 



aim to provide unlinkability (Freenet, Tor and 
GNUnet), plus one that is currently in the pro- 
cess of implementing such a feature (Tribler). 
Furthermore we notice that, once above a cer- 
tain threshold, none of the statistics have a clear 
effect on a projects popularity. For example, 
Freenet and Tor have similar statistics, while 
the number of Tor users [36J is several orders of 
magnitude higher than the number of Freenet 
users 0. 

So although clear differences aren't directly 
noticeable, we hope that this comparison will 
provide more insight into which systems are 
more serious and mature than others, while 
also showing which ones are still actively main- 
tained. 

7 Conclusion 

We explained how the leading privacy- 
enhancing technology Tor works and which 
components define a Tor network. Further- 
more, we looked at the main disadvantages 
and problems Tor is currently facing. We in- 
vestigated the issues around decentralization 
and compared systems that currently have a 
decentralized structure and /or mechanism. 

From table [IJ we conclude that there is no 
fully decentralized system capable of offering 
Tor anonymity today. Decentralized systems 
that do exist such as Tarzan, I2P, Torsk or 
Gnutella, show promising attempts to decen- 
tralize and anonymize the internet. Yet each of 
these systems either lacks in performance or is 
vulnerable to some type of attack. 

For the first time we document in detail, 
the amount of wasted effort and pain spent 
in decentralization. The current generation of 
technology lead by Tor still has room for im- 
provement, while the next generation is only 
just appearing on the horizon. The major prob- 
lems involving decentralization are excruciat- 
ingly difficult to overcome. None of the projects 
have succeeded in making the internet secure 
and private. 
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